Learning objectives

  1. Add a new column to a DataFrame
  2. Delete a column from a DataFrame
In [1]:
import pandas as pd
In [2]:
# Data source: https://www.kaggle.com/hesh97/titanicdataset-traincsv/data
train_data = pd.read_csv('./train.csv')
train_data.head()
Out[2]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

Adding a new column

  • Add a column with [] indexing
  • Use the insert function to add a column at a chosen position
In [4]:
train_data['Age_double'] = train_data['Age'] * 2
train_data.head()
Out[4]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Age_double
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 44.0
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 76.0
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 52.0
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 70.0
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 70.0
In [5]:
train_data['Age_tripple'] = train_data['Age'] + train_data['Age_double']
train_data.head()
Out[5]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Age_double Age_tripple
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 44.0 66.0
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 76.0 114.0
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 52.0 78.0
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 70.0 105.0
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 70.0 105.0
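The two cells above modify train_data in place. As a minimal sketch of a non-mutating alternative, the same derived columns can be built with DataFrame.assign, which returns a new DataFrame. The three-row frame here is an illustrative stand-in, not the Titanic data, and the missing Age value is an assumption added to show that arithmetic on a missing value yields NaN.

```python
import pandas as pd

# Illustrative stand-in for train_data; the third Age is deliberately missing.
df = pd.DataFrame({'Age': [22.0, 38.0, None]})

# assign returns a new DataFrame and leaves df untouched.
# Keyword arguments are evaluated in order, so the second lambda
# can already see the Age_double column created by the first.
out = df.assign(
    Age_double=lambda d: d['Age'] * 2,
    Age_tripple=lambda d: d['Age'] + d['Age_double'],
)

print(list(df.columns))   # ['Age']
print(list(out.columns))  # ['Age', 'Age_double', 'Age_tripple']
print(out)
```

Whether to mutate with [] or copy with assign is a style choice; assign is convenient when the original frame must stay intact for later cells.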
In [8]:
train_data.insert(3, 'Fare10', train_data['Fare'] / 10)
# Insert column 'Fare10' at position 3 (0-based) with values Fare / 10; re-running raises ValueError (column already exists)
train_data.head()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
c:\Repository\StudyPython_2022_intermediate\pandas\14. DataFrame에 column(컬럼) 추가 _ 삭제하기_Before.ipynb Cell 7' in <cell line: 1>()
----> 1 train_data.insert(3, 'Fare10', train_data['Fare'] / 10)
      2 # Insert column 'Fare10' at position 3 (0-based) with values Fare / 10; re-running raises ValueError (column already exists)
      3 train_data.head()

File c:\Repository\StudyPython_2022_intermediate\venv\lib\site-packages\pandas\core\frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert Fare10, already exists
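
The ValueError above appears when the insert cell runs a second time, because Fare10 already exists after the first run. One way to make the cell safe to re-run is to check for the column first. The sketch below uses a small stand-in frame with illustrative values, and insert_once is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Small stand-in frame; the values are illustrative, not the full dataset.
df = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'Fare': [7.25, 71.2833, 7.925],
})

def insert_once(frame, loc, column, value):
    """Insert `column` at position `loc` unless it already exists (hypothetical helper)."""
    if column not in frame.columns:
        frame.insert(loc, column, value)  # insert modifies the frame in place
    return frame

insert_once(df, 3, 'Fare10', df['Fare'] / 10)
insert_once(df, 3, 'Fare10', df['Fare'] / 10)  # second call does nothing, no ValueError

print(list(df.columns))
# ['PassengerId', 'Survived', 'Pclass', 'Fare10', 'Fare']
```

DataFrame.insert also accepts allow_duplicates=True, but that adds a second column with the same name rather than skipping the insert, which is rarely what a re-run should do.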

Deleting a column

  • Delete columns with the drop function
    • Pass a list to delete multiple columns at once
In [15]:
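df3 = train_data.drop('Age_tripple', axis=1)  # returns a new DataFrame; pass inplace=True to modify train_data itself
# The axis argument selects what to drop: 0 = rows, 1 = columns.
# drop defaults to axis=0 (rows), so axis=1 is required to drop a column.
# A DataFrame is always 2-D, so only axes 0 and 1 exist here;
# a 3-D NumPy array would have axes 0, 1 and 2.
df3.head()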
Out[15]:
PassengerId Survived Pclass Fare10 Name Sex Age SibSp Parch Ticket Fare Cabin Embarked Age_double
0 1 0 3 0.72500 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 44.0
1 2 1 1 7.12833 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 76.0
2 3 1 3 0.79250 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 52.0
3 4 1 1 5.31000 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 70.0
4 5 0 3 0.80500 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 70.0
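As a small sketch of the axis discussion above, the columns= keyword of drop is an equivalent spelling of axis=1, and drop leaves the original frame unchanged unless inplace=True is passed. The two-row frame is an illustrative stand-in for train_data.

```python
import pandas as pd

# Illustrative stand-in for train_data.
df = pd.DataFrame({
    'Age': [22.0, 38.0],
    'Age_double': [44.0, 76.0],
    'Age_tripple': [66.0, 114.0],
})

by_axis = df.drop('Age_tripple', axis=1)        # positional label + axis
by_keyword = df.drop(columns='Age_tripple')     # same result, no axis needed

same_result = by_axis.equals(by_keyword)
still_there = 'Age_tripple' in df.columns       # df itself was not modified

# With inplace=True the frame is modified and the call returns None.
result = df.drop('Age_tripple', axis=1, inplace=True)

print(same_result, still_there, result)
print(list(df.columns))
```

Using columns= avoids having to remember which axis number means what, at the cost of a slightly longer call.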
In [17]:
df4 = train_data.drop(['Age_tripple', 'Age_double'], axis=1)  # pass inplace=True to modify train_data itself
# To drop several columns at once, pass their names as a list
df4.head()
Out[17]:
PassengerId Survived Pclass Fare10 Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 0.72500 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 7.12833 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 0.79250 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 5.31000 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 0.80500 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
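Dropping a column that is no longer present raises KeyError, which can happen when a drop cell is re-run after an in-place drop. A hedged sketch of one workaround is the errors='ignore' option of drop, shown here on a one-row stand-in frame with illustrative values.

```python
import pandas as pd

# Illustrative stand-in for train_data.
df = pd.DataFrame({'Age': [22.0], 'Age_double': [44.0], 'Age_tripple': [66.0]})

# Drop two columns at once by passing a list.
trimmed = df.drop(columns=['Age_tripple', 'Age_double'])

# Dropping a missing column raises KeyError by default.
try:
    trimmed.drop(columns=['Age_double'])
    raised = False
except KeyError:
    raised = True

# errors='ignore' skips labels that are not present.
again = trimmed.drop(columns=['Age_tripple', 'Age_double'], errors='ignore')

print(raised)
print(list(again.columns))
```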